of music were used in this experiment. The multimodal fusion is tested using 
six performance measures. The proposed multimodal 
parameter achieves a highest accuracy of 97.68%. The sensitivity in this 
study exceeds 95%, and the highest specificity is 89.5%. 
This is an open access article under the CC BY-SA license. 
Corresponding Author: 
Zarith Liyana Zahari 


Faculty of Electrical and Electronic Engineering Technology, Kampus Pekan, Universiti Malaysia Pahang 
26600 Pekan, Pahang, Malaysia 
Email: zarithliyana@unikl.edu.my 


1. INTRODUCTION 

Understanding modalities is described as a process of reasoning over separated data that may be visible to one modality while hidden from others. Starting from separate unimodal data, the multimodal approach gathers information into extracted features and a decision part [1]. Multimodal fusion is divided into three stages: early, intermediate, and late. Early-stage fusion combines the features of different modalities into a single group before the learning phase. Intermediate-stage fusion addresses imperfect data by categorizing it during classification. In late-stage fusion, the semantic information of the modalities is decoded at the decision level [2]. Kalimeri and Saitis [3] used a multimodal approach to measure stress conditions in impaired-mobility data, such as brain responses and peripheral bio-signals. In hate speech classification on social media, text and photo data are split into multiple classes of fusion groups using a deep multimodal fusion technique [4]. In the late stage, emotional responses are estimated using a fractal dimension (FD) features approach to find the effect of weighting factors in 


Journal homepage: http://ijai.iaescore.com 


Int J Artif Intell ISSN: 2252-8938 


electroencephalogram (EEG) signal contribution [5]. The EEG signal therefore exhibits the imperfect-data problems of reliability and asynchronous features among different modalities, and it has become a familiar signal for exploiting the multimodal function. Similar work in this research area has classified human stress levels using EEG signals. Here, a self-stress questionnaire assessment and a music application form the basis of our initial work. 

Sometimes, a multimodal function omits imperfect modality parameters of the EEG signal by ignoring or not counting them, which hinders good measurement performance. This is a crucial limitation of the functional approach. Consequently, this study uses a multimodal function that improves on this by taking poor or unbalanced data, such as EEG signals, into account. The idea is to enhance the multimodal treatment of EEG data by adjusting the numerical values in the multimodal function given in (1). 


$p_{multimodal} = \alpha\, p_{eeg} + (1 - \alpha)\, p_{music}$ (1) 


In medical and bio-signal applications, the EEG is an electrophysiological signal. The electrophysiological signal from brain waves can be monitored, measured, and analyzed for its characteristics in the EEG frequency range. The international 10/20 system of electrode placement is used to position the electrode channels by relative distance so that the EEG signal is monitored, measured, and recorded properly. For example, recognition of a high degree of arousal in the EEG signal during a complex intention task used 14 channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) collected with an Emotiv headband [6]. In EEG signal processing, a bandpass filter is applied in practice to obtain the EEG frequency range from 0.01 Hz to 100 Hz with an approximate amplitude of 100 µV [7]. The comprehensive features of the EEG signal are then extracted into various connectivity functions based on the primary frequency bands: delta, theta, alpha, beta, and gamma [8]. Similarly, Al-shargie et al. [9] applied a frequency range of 0.5-30 Hz to measure the EEG signal in multilevel mental stress assessment. 

Recent investigations have demonstrated that the primary EEG frequency bands are suitable for use in signal processing. Moreover, feature extraction can rely on signal-processing domain analysis, which comprises the time domain, the frequency domain, and the time-frequency domain. For frequency-domain analysis, the EEG signal is viewed spectrally and analyzed using power spectral density (PSD). The PSD estimates power and coherence values at high frequency resolution over the frequency range of the EEG signal [10]. PSD can also provide the peak power and mean value [11], as well as the standard deviation, skew, kurtosis, and root mean square (RMS) [12]. The time-frequency domain relates the time and frequency domains, and can therefore be represented as a combination of time and frequency in a single feature function. The idea is to track the frequencies of the EEG signal over time, which is realized using the continuous wavelet transform (CWT) and short time Fourier transform (STFT) methods. Time-frequency domain analysis yields the energy distribution of the EEG signal [13]. Findings similar to those in the frequency domain have also been reported in the time-frequency domain [14], [15]. Time-frequency approaches such as the STFT are widespread. Many EEG features derived from the STFT, such as the minimum, maximum, and median values, have been shown to be effective [16]. Likewise, some researchers use entropy, Hjorth mobility, and complexity features calculated from the energy distribution [17], [18]. The features obtained from the EEG signal with these common methods can be used as the EEG probability parameter in this research. 

Aside from the EEG probability parameter, other variables must be considered, including weighting and application factors. The stress level was employed as the weighting factor, while music was used as the application factor. Because both can be expressed numerically, the stress level and music application were chosen as multimodal parameters for this study. Stress manifests when an interruption places mental and physical pressure on a person, and it can be beneficial or non-beneficial to the human body. If a person can manage the stress and external pressure for improvement and gain, the stress is considered good stress because overcoming the stress factors is beneficial. Non-beneficial stress, on the other hand, mainly causes depression, mental problems, and critical illness. A self-stress assessment, in which the subject answers a stress questionnaire, is therefore required to identify the stress level. Moreover, recognising one’s stress level can help people manage stress before it worsens and leads to a protracted stress state. Self-stress assessment methods are potentially of questionable accuracy and reproducibility, as shown in [19], [20]. Olaitan [21] used self-stress assessment together with clinical assessment to identify hypertension in health workers. This shows that physiological assessment and self-stress assessment can be combined in stress level measurement. Hence, in this research, physiological measurement and a self-stress assessment questionnaire were applied to obtain the stress level classification. The result serves as the weighting factor in the multimodal function. 

Several studies, such as Paszkiel et al. [22], explored the impact of music on stress and identified the stress level through music. Conceptually similar work in [23] observed the effect of music on mental and stress conditions. In addition, music can reduce stress in cancer patients because it can be part of therapy treatment [24]. This is generally accepted in the literature, from which it can be concluded that music application can be considered a component of the multimodal function for determining the human stress level. Therefore, two types of music, low rhymes and high rhymes, are used to examine the effects on stress levels. The contributions of this paper are summarized as: i) a proposed EEG features extraction as the EEG probability parameter based on low rhymes and high rhymes, and ii) an enhanced multimodal function equation to determine the stress level classification using physiological measurements and a self-assessment stress questionnaire. 


2. RESEARCH METHOD 

The proposed method consists of four major steps, as shown in Figure 1: self-stress assessment, signal processing, features extraction, and multimodal performance. The method begins with the self-stress assessment, which includes the music and EEG signal measurement processes. Next, the signal is processed and features are extracted to arrive at the multimodal performance. Signal processing is the technique for obtaining a standard EEG signal from the raw EEG signal, while features extraction produces the features applied in the multimodal function. The multimodal performance is the best measurement result achieved with the multimodal parameter. 


[Figure 1: block diagram with the stages self-stress assessment (with EEG signal), signal processing, features extraction, and multimodal] 


Figure 1. The structure of multimodal 


2.1. Self-stress assessment 

The self-stress assessment uses a questionnaire and is scored on the scale provided by the International Stress Management Association UK (ISMAUK). The ISMA stress questionnaire scale defines three stress classes: low stress (0), medium stress (1), and high stress (2). The classes are assigned based on the total score of the answers. Each yes answer scores 1 and each no answer scores 0, so the total score is simply the number of yes answers. The total then determines the subject’s stress class. The score ranges and stress levels are described in Table 1. 


Table 1. Level of the stress score value and the conditions 


Total score   Stress class   Description 
0-4           0              Mild stress-related illness. 
5-13          1              Medium experience stress 
14-25         2              Medium experience stress leads to creating unhealthy behaviours 
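
The scoring rule and the class boundaries of Table 1 can be sketched as a short function. This is a minimal illustration, not the authors' implementation: the function name `stress_class` and the example answers are hypothetical, and the upper limit of 25 is taken from the largest score in Table 1.

```python
def stress_class(answers):
    """Map yes/no questionnaire answers to a stress class (0, 1, or 2)."""
    # Each "yes" (True) scores 1 and each "no" (False) scores 0,
    # so the total score is the number of yes answers.
    total = sum(1 for answer in answers if answer)
    if total > 25:
        raise ValueError("total score exceeds the Table 1 maximum of 25")
    if total <= 4:
        return 0  # low stress
    if total <= 13:
        return 1  # medium stress
    return 2      # high stress


# Hypothetical subject who answered "yes" to 15 questions and "no" to 10.
answers = [True] * 15 + [False] * 10
print(stress_class(answers))  # 2
```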


2.2. Music 

Music is performed using the voice or other sounds to present harmony, melody, or rhythm. Music is too broad to describe fully, but it can be categorized as empathizing or systemizing music, as described in [25]. Sharing and understanding other people’s feelings can be expressed in lyrics, video clips, or musical rhythms. These mechanisms influence people while they listen to music. On the other hand, gender has no noticeable influence on the reaction to a style of music [26]. However, the selection of music is still a critical part of collecting the EEG signal because it affects the brain waves. This research used two types of music: low rhymes and high rhymes. A baby rhythm song represents the low rhymes, and a pop-punk song represents the high rhymes. Pop-punk is categorized as systemizing because it has an intense dimension, while baby rhythm is relaxing music and is classified as empathizing. These pieces were selected because they cover both the empathizing and systemizing types, which makes the EEG signal response easy to identify. 


2.3. EEG signal 

This section describes the experimental procedure for EEG signal data collection. The procedure was conducted under ethics identification number IREC 2021-039, approved by the IITUM Research Ethics Committee (IREC). Subjects had to be in good health with good hearing, not be taking any prescription medicine (including antidepressants), and not be pregnant. In addition, subjects were required to understand, read, and write English. Thirty-four subjects, with an equal number of each gender, took part in the data collection. Experiments were conducted between 9 AM and 5 PM. Each experiment took 20 minutes, and the process is detailed in Figure 2. 

At the beginning of data collection, the subject was seated in a quiet room and asked to relax while sitting on a chair. The subject then answered the full self-stress assessment questionnaire. After that, the subject was asked to move as little as possible until the experiment was completed. An Emotiv EPOC+ EEG headset was used to capture the EEG signal in offline mode. The advantage of this headset is that it is wireless and connects to the EmotivPro software to record the EEG signal without interruption. In this research, ten targeted channels (AF3, F7, F3, FC5, T7, AF4, F4, F8, FC6, and T8) were used. Earphones were used for listening to the music. 


[Figure 2: timeline of the data collection session, in order: self-stress assessment, relax condition, standby mode (2 min), music 1 (4 min), standby mode (2 min), music 2 (4 min), standby mode to end process; two further durations, 4 min and 2 min, appear in the figure for the remaining stages] 


Figure 2. Data collection process for music application 


2.4. Signal processing 

Multiple signals are produced while capturing and recording the EEG signal. The purpose of the signal processing step is therefore to eliminate undesired and interfering signals. Removing the unwanted signals is required to obtain a standard EEG signal in the range of -100 µV to 100 µV in the frequency domain. A band-pass filter is used to obtain the standard EEG signal, following [27], [28]. The EEG frequency bands delta, theta, alpha, and beta are then separated into the target frequency ranges of this research: 0 Hz to 4 Hz, 4 Hz to 7 Hz, 8 Hz to 12 Hz, and 12 Hz to 25 Hz, respectively. The signal processing steps are shown in Figure 3. 
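
The band separation step can be illustrated with a small sketch. It uses an ideal frequency-domain mask (a naive DFT, zeroing of out-of-band bins, and an inverse DFT) as a stand-in; this is not the band-pass filter design of [27], [28], which is not detailed here. The sampling rate of 128 Hz, the synthetic test signal, and the name `bandpass_dft` are assumptions made for illustration only.

```python
import cmath
import math

# Band edges in Hz as listed in this section.
BANDS = {"delta": (0, 4), "theta": (4, 7), "alpha": (8, 12), "beta": (12, 25)}


def bandpass_dft(signal, fs, low, high):
    """Keep only the components between low and high Hz using an ideal DFT mask."""
    n = len(signal)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same physical frequency for a real signal.
        freq = min(k, n - k) * fs / n
        if not (low <= freq <= high):
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is numerical noise for a real input.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * i / n) for k, c in enumerate(spectrum)) / n).real
        for i in range(n)
    ]


fs = 128  # assumed sampling rate in Hz
# Synthetic one-second signal: a 10 Hz (alpha) tone plus a 40 Hz component.
signal = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.5 * math.sin(2 * math.pi * 40 * t / fs)
    for t in range(fs)
]
low, high = BANDS["alpha"]
alpha_wave = bandpass_dft(signal, fs, low, high)
print(round(max(abs(v) for v in alpha_wave), 3))
```

An ideal mask is convenient for a short example because it needs no filter design, but a practical pipeline would normally use a designed filter with a controlled transition band.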


[Figure 3: block diagram ending in frequency band selection (delta, theta, alpha and beta)] 


Figure 3. The block diagram for the EEG signal processing 


2.5. Features extraction 

Features extraction is the method for obtaining specific features of the EEG signal. Frequency-domain and time-frequency-domain analysis are the most conventional and standard practices for analyzing EEG signals. These analyses reveal the potential features in each frequency band used in this research. PSD is used for the frequency-domain analysis, while STFT and CWT are used for the time-frequency domain. The purpose of using the two domains is to compare feature performance on the EEG signal. This technique is supported by Rahman et al. [28], who applied it to determine features in the EEG signal. 


2.5.1. Power spectral density (PSD) 

In signal processing, the PSD technique is commonly used to convert a signal from the time domain to the frequency domain. It can also identify the minimum, maximum, and average power values. Familiar feature-extraction functions for EEG signals are the mean, standard deviation, RMS, standard deviation error, median, mean deviation, and coefficient of mean deviation. Skew, variance, and kurtosis are also among the statistical features used. The statistical features are extended with relative power and entropy. Relative power is added to determine the highest concentration of power in a specific frequency range. Entropy is used to characterize the probability density function of the power values in the EEG signal. In addition, magnitude-squared coherence and the magnitude of the cross power spectral density can also be applied as frequency-domain features. 
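
As an illustration of the relative power and entropy features, the sketch below estimates a one-sided periodogram with a naive DFT and derives both quantities from it. It is a minimal example under stated assumptions: the sampling rate of 128 Hz and the synthetic signal are hypothetical, band edges are treated as half-open intervals, and Shannon entropy of the normalized spectrum is one common definition that may differ from the one used in this study.

```python
import cmath
import math

# Band edges in Hz as listed in the signal processing section.
BANDS = {"delta": (0, 4), "theta": (4, 7), "alpha": (8, 12), "beta": (12, 25)}


def periodogram(signal, fs):
    """Return (frequencies, power) of a one-sided periodogram via a naive DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def relative_band_power(freqs, power):
    """Share of the total in-band power that falls inside each band."""
    band_power = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(band_power.values())
    return {name: value / total for name, value in band_power.items()}


def spectral_entropy(power):
    """Shannon entropy (in bits) of the power spectrum normalized to sum to 1."""
    total = sum(power)
    probs = [p / total for p in power]
    return -sum(q * math.log2(q) for q in probs if q > 0)


fs = 128  # assumed sampling rate in Hz
# Synthetic one-second signal: a strong 10 Hz (alpha) tone and a weak 20 Hz (beta) tone.
signal = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(fs)
]
freqs, power = periodogram(signal, fs)
rel = relative_band_power(freqs, power)
print(max(rel, key=rel.get))  # alpha
```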


2.5.2. Short time Fourier transform (STFT) 

The STFT technique was used to analyse the sinusoidal frequency content of the EEG signal. The EEG signal is represented as sinusoidal waves with complex exponential content, which is commonly associated with transform-domain analysis. The STFT can therefore measure and analyse a variety of features in the signal. Determining these features helps improve the signal representation. This paper obtains the energy values from the EEG signal and uses them to measure the minimum, maximum, and average features. The energy values are also used to determine the mean, standard deviation, RMS, standard deviation error, median, mean deviation, coefficient of mean deviation, variance, skew, and kurtosis as features. In addition, two advanced mathematical features are included: entropy and recursive energy efficiency (REE). The purpose of these two additional features is to identify the concentration of energy in the signal processing result. 
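
A minimal sketch of how energy values can be obtained from an STFT is given below: the signal is cut into overlapping Hann-windowed frames, each frame is transformed with a naive DFT, and the squared magnitudes are summed per frame. The frame length, hop size, sampling rate, and synthetic signal are assumptions for illustration; the study's actual STFT settings are not stated here.

```python
import cmath
import math


def stft_energy(signal, frame_len=32, hop=16):
    """Return the spectral energy of each Hann-windowed frame of the signal."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
        for i in range(frame_len)
    ]
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        energy = 0.0
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i, x in enumerate(frame)
            )
            energy += abs(coeff) ** 2
        energies.append(energy)
    return energies


fs = 128  # assumed sampling rate in Hz
# Synthetic signal: one second of silence followed by one second of a 10 Hz tone.
signal = [0.0] * fs + [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
energies = stft_energy(signal)

# Minimum, maximum, and average energy, as used for the STFT features.
print(min(energies), max(energies), sum(energies) / len(energies))
```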


2.5.3. Continuous wavelet transform (CWT) 

There are various methods for obtaining feature functions in the time-frequency domain, such as the continuous wavelet transform used in this paper. From the energy distribution in the CWT, the minimum, maximum, and average peak values can be found. As with the STFT method, the energy values are used to measure the mean, standard deviation, RMS, standard deviation error, median, mean deviation, coefficient of mean deviation, variance, skew, and kurtosis. All of these are also counted as features in the CWT method. Two further features, entropy and REE, are also extracted from the EEG signal. 
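
The statistical features shared by the STFT and CWT methods can be computed from a list of energy values as sketched below. The function name `statistical_features` and the sample values are hypothetical, only a subset of the listed features is shown, and population (not sample) moments are assumed.

```python
import math
import statistics


def statistical_features(values):
    """Compute a subset of the listed statistical features for a sequence of values."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    centered = [v - mean for v in values]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "std": std,
        "rms": math.sqrt(sum(v * v for v in values) / n),
        "mean_deviation": sum(abs(c) for c in centered) / n,
        # Standardized third and fourth central moments
        # (kurtosis here is not excess kurtosis).
        "skew": sum(c ** 3 for c in centered) / (n * std ** 3),
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4),
    }


# Hypothetical energy values for one frequency band.
energy = [1.0, 2.0, 3.0, 4.0, 5.0]
features = statistical_features(energy)
print(features["mean"], features["median"], features["rms"])
```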


2.6. Multimodal 

The most significant contribution of this study is the enhancement of the multimodal parameter function by applying EEG features extraction based on music application to identify stress levels from physiological data and a self-assessment stress questionnaire. In (1), $\alpha$ is the weighting factor, $p_{eeg}$ is the EEG signal features extraction, and $p_{music}$ is the music class. The weighting factor takes one of three values (0, 1, 2), derived from the stress classes detailed in Table 1. The value of $p_{eeg}$ is the features extraction result described in the features extraction section; these features fall into two domains, the frequency domain and the time-frequency domain. Lastly, the music class parameter $p_{music}$ takes one of two values (1, 2) based on the music type. The improvement of the parameter in the multimodal function is measured with several metrics: accuracy, sensitivity, specificity, area under the curve (AUC), F1 score, and informedness. These performance measures indicate the significance of the improvement. 
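
A literal evaluation of (1) with the parameter values described above can be sketched as follows. The function name `multimodal_parameter` and the input values are hypothetical, and the sketch does not cover how the study aggregates features before or after fusion.

```python
def multimodal_parameter(alpha, p_eeg, p_music):
    """Evaluate equation (1): alpha * p_eeg + (1 - alpha) * p_music."""
    if alpha not in (0, 1, 2):
        raise ValueError("alpha must be a stress class: 0, 1, or 2")
    if p_music not in (1, 2):
        raise ValueError("p_music must be a music class: 1 or 2")
    return alpha * p_eeg + (1 - alpha) * p_music


# Hypothetical inputs: stress class 2, an EEG feature value of 5.0, music class 1.
print(multimodal_parameter(alpha=2, p_eeg=5.0, p_music=1))  # 9.0
```

With a weighting factor of 1 the result equals the EEG feature value, and with 0 it equals the music class, which follows directly from the form of (1).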


3. RESULTS AND DISCUSSION 

This section presents the findings on enhancing the multimodal parameter in the EEG signal based on music application for thirty-four subjects. The multimodal parameter contains the weighted stress classes, EEG signal features, and music categories. The results for each parameter are discussed in detail in sections 3.1 to 3.3. 


3.1. Self-stress assessment 

The weighted stress classes are obtained from the total score using the calculation method. The total scores of the seventeen male and seventeen female subjects, aged nineteen to thirty, are summarised as stress class percentages in Figure 4. The figure shows that class 2 has the highest percentage compared with the other classes. No subject fell into class 0 in the self-stress assessment. The 35% + 45% of the total stress score percentage in class 1 shows that this group has experienced stress. However, there is little difference between the male and female groups in the percentages for classes 1 and 2. Overall, 33% of subjects are in class 1 and 67% in class 2, which shows that most people experience stress at a medium or high level. These stress levels show that people experience fatigue and pressure in working to achieve their life goals [29]. Viewed positively, stress may encourage people to be more independent, control their emotions, and think more creatively when making decisions. 


Figure 4. Stress classes in percentage for male and female 


3.2. Features extraction and multimodal 

EEG signal features are obtained with the features extraction methods in the frequency and time-frequency domains. The PSD approach is used to analyze the EEG signal in the frequency domain, while STFT and CWT are used in the time-frequency domain, with nineteen features in both domain analyses. In the frequency domain, features of the power values were measured; in the time-frequency domain, features of the energy distribution values were evaluated. Features extraction is subdivided into three conditions: resting state (RS), pop-punk (M1), and baby rhythm (M2). The average features extraction results for each analysis domain under the three conditions are presented in Table 2. In both domains, music 1 (pop-punk) produced higher features extraction values than the resting state and music 2 (baby rhythm). The differences are caused by the chosen music type. Music 1 combines power pop and pop music in a fast rock tempo with melodies and chord progressions, while music 2 is a low-tempo melody known as relaxing music. Relaxing music can calm listeners and help them toward a relaxed mental state [30]. 


Table 2. The average of features extraction result 


Method   Condition   Result 
PSD      RS          3.875835 µV 
PSD      M1          5.348473 µV 
PSD      M2          4.029015 µV 
CWT      RS          3.79981 (J/Hz) 
CWT      M1          5.348725 (J/Hz) 
CWT      M2          4.867353 (J/Hz) 
STFT     RS          3.891119 (J/Hz) 
STFT     M1          5.391332 (J/Hz) 
STFT     M2          4.783486 (J/Hz) 


3.3. Multimodal 

Before the multimodal function was applied, the average features extraction results had low values. In the resting state, the subject is in a relaxed condition, unlike when listening to a music genre. The consistency of subjects in the relaxed condition is reflected in the low features extraction values. The multimodal result was computed from the stress class and features extraction parameter values and is shown in Table 3. However, no significant difference was observed between the domains and techniques used. 


Table 3. The average of multimodal 


Method   Condition   Result 
PSD      RS          42.03393 µV 
PSD      M1          46.40736 µV 
PSD      M2          41.05527 µV 
CWT      RS          41.93819 (J/Hz) 
CWT      M1          43.59026 (J/Hz) 
CWT      M2          43.01139 (J/Hz) 
STFT     RS          45.49176 (J/Hz) 
STFT     M1          40.86589 (J/Hz) 
STFT     M2          41.31803 (J/Hz) 


The average multimodal result is evaluated using six performance measures: accuracy, sensitivity, specificity, AUC, F1 score, and informedness. The results for accuracy, sensitivity, and specificity are shown in Figure 5. The use of stress classes and average features extraction in multimodal fusion is validated by a highest accuracy of 97.68%. Sensitivity and specificity measure the proportion of positives and the proportion of negatives that are correctly identified, respectively. The sensitivity in this study exceeds 95%, and the highest specificity is 89.27%. These high sensitivity and specificity values indicate that the measures are acceptable as a testing technique for the multimodal parameter result. Furthermore, the results for PSD and CWT follow the same pattern as STFT and also reach high accuracy. 
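
For reference, accuracy, sensitivity, and specificity follow from the counts of a confusion matrix, as in the sketch below. It assumes a binary labelling with one class treated as positive; the labels and predictions are hypothetical and do not come from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def basic_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical labels: 1 marks the positive class, 0 the negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = basic_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
```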


[Figure 5: line plot titled "Performance measure of multimodal"; y-axis: percentage of multimodal performance; x-axis: resting state, M1, M2; series: accuracy, sensitivity, specificity] 


Figure 5. The result for accuracy, sensitivity, and specificity in percentage 


In addition, the AUC, F1 score, and informedness results, reported in Table 4, contribute to determining the parameter value in the multimodal function. The AUC measures the area under the curve of the feature results. The highest AUC in this paper is 1, which is the perfect score; the closer the value is to 1, the more reliable the performance [21]. A high F1 score provides evidence that the solution enhanced the multimodal function, because the F1 score serves the same purpose as accuracy. Informedness ranges from -1 to 1, where 1 is correct, -1 is incorrect, and 0 is chance [31]. The informedness values range from 0.793 to 0.812, which indicates results that are close to correct. 
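
The three additional measures can be sketched with their standard definitions: F1 as the harmonic mean of precision and recall, informedness as sensitivity plus specificity minus 1, and AUC as the fraction of positive-negative pairs that a score ranks correctly. The counts and scores below are hypothetical and are not taken from Table 4.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def informedness(tp, tn, fp, fn):
    """Sensitivity + specificity - 1, ranging from -1 to 1 (0 is chance)."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confusion counts and classifier scores.
tp, tn, fp, fn = 45, 40, 10, 5
print(round(f1_score(tp, fp, fn), 3))          # 0.857
print(round(informedness(tp, tn, fp, fn), 3))  # 0.7
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))  # 0.889
```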


Table 4. Additional performance measurement of multimodal 


Method             PSD (RS)   PSD (M1)   PSD (M2)   CWT (RS)   CWT (M1)   CWT (M2)   STFT (RS)   STFT (M1)   STFT (M2) 
AUC                0.982      0.999      0.969      1          1          0.991      0.991       1           0.986 
F1 score           0.982      0.986      0.901      1          1          0.986      0.958       1           0.958 
Y (informedness)   0.795      0.794      0.807      0.799      0.789      0.797      0.801       0.797       0.812 


4. CONCLUSION 


Overall, a technique conceptually comparable to previous work can be applied alongside the approach of combining physiological measures and a self-assessment stress questionnaire in the multimodal parameter. The study achieved a highest accuracy of 97.68% and an informedness of 5.7, while the AUC and F1 score reached 1. Sensitivity and specificity were 95.81% and 89.5%, respectively. Based on these findings, the proposed parameter and method perform well using the EEG signal based on music application in the multimodal function for thirty-four subjects. The stress level is identified as an area that needs further investigation in order to learn more about human stress. 